Introduction


Part I: Working with an Undirected Network


For the LondonGangNet network, do the following:

  1. Plot the LondonGangNet and plot a random network with the same number of edges.
  2. Based on the plots, how does the LondonGangNet differ from the random network?
  3. Estimate an ERGM where ties form at random. Interpret the coefficient.
  4. Examine and describe the relationship between degree and the attribute Prison.
  5. Estimate an ERGM where gang members differ in their degree based on the attribute Prison. Interpret the coefficient.
  6. Estimate an ERGM where gang members differ in their degree based on the attribute Convictions. Interpret the coefficient.
  7. Examine the mixing matrix for Birthplace. What is the interpretation of this matrix?
  8. Estimate an ERGM where gang members of the same Birthplace are more likely to form ties.
  9. What is the probability of a tie between gang members who are from the same Birthplace?
  10. Estimate an ERGM where gang members of similar Age are more likely to form ties.
  11. What is the probability of a tie between gang members who are the same Age?
  12. Simulate a network from the Prison or (and?) Convictions ERGMs (i.e. #5 or/and #6).
  13. Plot the simulated network from #12 and the LondonGangNet, identifying nodes based on Prison or Conviction in both plots.
  14. Simulate a network from the Birthplace or (and?) Age ERGMs (i.e. #8 or/and #10).
  15. Plot the simulated network from #14 and the LondonGangNet, identifying nodes based on Birthplace or Age in both plots.
  16. Evaluate the goodness of fit for the models in #5, #6, #8, or #10.
  17. Which model provides the best fit to the data? How do you know?


Part II: Working with a Directed Network


For the trustNet network, do the following:

  1. Plot the trustNet and plot a random network with the same number of edges.
  2. Based on the plots, how does the trustNet differ from the random network?
  3. Estimate an ERGM where ties form at random. Interpret the coefficient.
  4. Examine and describe the relationship between degree and the attribute White.
  5. Examine and describe the relationship between degree and the attribute YearsOnUnit.
  6. Estimate an ERGM where individuals differ in their degree based on the attribute White. Interpret the coefficient.
  7. Estimate an ERGM where individuals differ in their degree based on the attribute YearsOnUnit. Interpret the coefficient.
  8. Examine the mixing matrix for White. What is the interpretation of this matrix?
  9. Estimate an ERGM where are more likely to form ties based on White (i.e. homophily for the attribute White).
  10. What is the probability of a tie between two individuals who are both white?
  11. What is the probability of a tie between two individuals who are both non-white?
  12. Estimate an ERGM where ties are reciprocated. Interpret the coefficient.
  13. Simulate a network from the ERGM in #6, #7, #9, or (and?) #12.
  14. Plot the simulated network from #13 and the trustNet, identifying nodes based on White or (and?) YearsOnUnit in both plots.
  15. Evaluate the goodness of fit for the models in #6, #7, #9, or (and?) #12.
  16. Which model provides the best fit to the data? How do you know?



Part I: Working with an Undirected Network


First, let’s read in the LondonGangNet network. This is stored as an .rds document in the data folder on the SNA Textbook site. We will use the readRDS() function, with the file path, to load the file. Since we are calling a url, we need to use the url() function as well.

Finally, we need to make sure the sna and network packages are loaded, using library(), so that R recognizes the LondonGangNet object as one of class network.


# load the libraries we need
library( sna )
library( network )
library( ergm )

# define the path location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-london-gang-net.rds"
LondonGangNet <- readRDS( url(loc ) )

# look at the network
LondonGangNet
##  Network attributes:
##   vertices = 54 
##   directed = FALSE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 315 
##     missing edges= 0 
##     non-missing edges= 315 
## 
##  Vertex attribute names: 
##     Age Arrests Birthplace Convictions Music Prison Ranking Residence vertex.names 
## 
## No edge attributes


1. Plot the LondonGangNet and plot a random network with the same number of edges.


First, let’s set up our random network. Recall that we want to generate a random network with the same number of edges and density as the LondonGangNet object.

# set the seed to reproduce these results
set.seed( 605 )

# generate the random graph
random.graph <- rgraph(
  dim( as.matrix( LondonGangNet ) )[1],
  1,                   
  tprob = sum( as.matrix( LondonGangNet ) )/2 / ( dim( as.matrix( LondonGangNet ) )[1] *( dim( as.matrix( LondonGangNet ) )[1] - 1 ) / 2 ) ,
  mode = "graph"
  )

# now coerce the random graph to a network object
random.net <- as.network( random.graph, directed = FALSE )
# Set the coordinates
set.seed( 605 )
coords <- gplot( LondonGangNet )

Now, we can plot them both:

# set the margins
par( mfrow=c( 1,2 ), 
     mar=c( 0.1, 0.5, 2, 0.5 ) )

# set the seed
set.seed( 605 )

# create the first plot
gplot( 
  LondonGangNet, 
  gmode = "graph",
  edge.col="grey40", 
  vertex.col="#c78c71",
  coord = coords,
  main = "London Gang Network"
  )

# create the second plot
gplot( 
  random.net, 
  gmode = "graph",
  edge.col = "grey40", 
  vertex.col="#069e6e",
  main = "Random network"
  )


2. Based on the plots, how does the LondonGangNet differ from the random network?


The most striking difference is in the distribution of the edges over the nodes. Even though the number of edges are the same in each graph, the random graph looks, well, more random, whereas the LondonGangNet shows a core group with many edges and some peripheral nodes with fewer edges. Put differently, the random graph does not reproduce the variation in the degree distribution that is observable in the plot of the LondonGangNet.


3. Estimate an ERGM where ties form at random. Interpret the coefficient.


# estimate the model
summary( 
  edge.indep.LGN <- ergm( LondonGangNet ~ edges ) 
)
## Call:
## ergm(formula = LondonGangNet ~ edges)
## 
## Maximum Likelihood Results:
## 
##       Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges  -1.2649     0.0638      0  -19.83   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1984  on 1431  degrees of freedom
##  Residual Deviance: 1508  on 1430  degrees of freedom
##  
## AIC: 1510  BIC: 1516  (Smaller is better. MC Std. Err. = 0)
# calculate the probability of a tie
plogis( edge.indep.LGN$coefficients[1] * 1 )
##     edges 
## 0.2201258

The probability of a tie in this model is 0.22. This is the same as the density of the network, 0.22


4. Examine and describe the relationship between degree and the attribute Prison.


The mean degree for those who have been to prison is 12.71, while the mean degree for those who have not been to prison is 10.83. This means that individuals who have been to prison (i.e. LondonGangNet %v% "Prison" == 1) have, on average, roughly two more edges than those who have not been to prison.


5. Estimate an ERGM where gang members differ in their degree based on the attribute Prison. Interpret the coefficient.


# estimate the model
prison.LGN <- ergm( 
  LondonGangNet ~ edges 
  + nodefactor( "Prison" )
  ) 

# summarize the model
summary( prison.LGN )
## Call:
## ergm(formula = LondonGangNet ~ edges + nodefactor("Prison"))
## 
## Maximum Likelihood Results:
## 
##                     Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges               -1.45708    0.10762      0 -13.539   <1e-04 ***
## nodefactor.Prison.1  0.20948    0.09158      0   2.287   0.0222 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1984  on 1431  degrees of freedom
##  Residual Deviance: 1503  on 1429  degrees of freedom
##  
## AIC: 1507  BIC: 1518  (Smaller is better. MC Std. Err. = 0)

The sign of the coefficient is positive, indicating that the probability of a tie increases if we toggle from Prison being 0 to Prison being a 1.

More specifically, we can calculate the predicted probability of a tie between i and j if they both have been to prison using the coefficient of 0.21:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{prison} \times \delta_{prison})\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-1.46 \times 1) + (0.21 \times 2))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.46 + 0.42)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.04) = 0.26\)


The value of 0.26 is the predicted probability of a tie.


6. Estimate an ERGM where gang members differ in their degree based on the attribute Convictions. Interpret the coefficient.


# estimate the model
convictions.LGN <- ergm( 
  LondonGangNet ~ edges 
  + nodecov( "Convictions" )
  ) 

# summarize the model
summary( convictions.LGN )
## Call:
## ergm(formula = LondonGangNet ~ edges + nodecov("Convictions"))
## 
## Maximum Likelihood Results:
## 
##                     Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges               -1.38274    0.12470      0 -11.088   <1e-04 ***
## nodecov.Convictions  0.01385    0.01245      0   1.113    0.266    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1984  on 1431  degrees of freedom
##  Residual Deviance: 1507  on 1429  degrees of freedom
##  
## AIC: 1511  BIC: 1522  (Smaller is better. MC Std. Err. = 0)

The sign of the coefficient is positive, but not significantly different from zero. This indicates that the probability of a tie, based on changing Convictions from 0 to 1 does not differ from what we may observe if ties formed at random.


7. Examine the mixing matrix for Birthplace. What is the interpretation of this matrix?


mixingmatrix( LondonGangNet, "Birthplace" )
##    1  2  3  4
## 1 35 38 48 26
## 2 38 23 48 16
## 3 48 48 46 27
## 4 26 16 27  8

The diagonal represents homophilous dyads. The off-diagonal represents heterophilous dyads. Based on a visual inspection of the table, there does not appear to be a tendency toward homophily based on Birthplace.


8. Estimate an ERGM where gang members of the same Birthplace are more likely to form ties.


# estimate the model using the nodematch term
homophily.birthplace.LGN <- ergm( 
  LondonGangNet ~ edges 
  + nodematch( "Birthplace" )
  ) 

# print the model output
summary( homophily.birthplace.LGN )
## Call:
## ergm(formula = LondonGangNet ~ edges + nodematch("Birthplace"))
## 
## Maximum Likelihood Results:
## 
##                      Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges                -1.37764    0.07854      0 -17.541  < 1e-04 ***
## nodematch.Birthplace  0.35634    0.13532      0   2.633  0.00846 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1984  on 1431  degrees of freedom
##  Residual Deviance: 1502  on 1429  degrees of freedom
##  
## AIC: 1506  BIC: 1516  (Smaller is better. MC Std. Err. = 0)


9. What is the probability of a tie between gang members who are from the same Birthplace?


Using the coefficient of round( homophily.birthplace.LGN$coefficients[2], 2 ), the predicted probability of an edge between nodes with the same Birthplace is:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + (\theta_{Birthplace} \times \delta_{Birthplace}))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.38 \times 1 + 1.5 \times 1)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.38 + 1.5)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(0.12) = 0.53\)


Note that the model above tests for uniform homophily. That is, that homophily is the same over the categories of Birthplace. We can test for differential homophily by setting the diff= argument to TRUE in the nodematch() term.

# estimate the model using the nodematch term
homophily.d.LGN <- ergm( 
  LondonGangNet ~ edges 
  + nodematch( "Birthplace", diff = TRUE )
  ) 

# print the model output
summary( homophily.d.LGN )
## Call:
## ergm(formula = LondonGangNet ~ edges + nodematch("Birthplace", 
##     diff = TRUE))
## 
## Maximum Likelihood Results:
## 
##                        Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges                  -1.37764    0.07854      0 -17.541  < 1e-04 ***
## nodematch.Birthplace.1  1.49900    0.25884      0   5.791  < 1e-04 ***
## nodematch.Birthplace.2  0.75193    0.27000      0   2.785  0.00535 ** 
## nodematch.Birthplace.3 -0.23180    0.17960      0  -1.291  0.19682    
## nodematch.Birthplace.4  1.51117    0.52347      0   2.887  0.00389 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1984  on 1431  degrees of freedom
##  Residual Deviance: 1459  on 1426  degrees of freedom
##  
## AIC: 1469  BIC: 1495  (Smaller is better. MC Std. Err. = 0)

Looking at the model output we see that there is considerable differences across the categories in terms of homophily.


10. Estimate an ERGM where gang members of similar Age are more likely to form ties.


# estimate the model using the absdiff term
homophily.age.LGN <- ergm( 
  LondonGangNet ~ edges 
  + absdiff( "Age" )
  ) 

# print the model output
summary( homophily.age.LGN )
## Call:
## ergm(formula = LondonGangNet ~ edges + absdiff("Age"))
## 
## Maximum Likelihood Results:
## 
##             Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges       -0.97013    0.09910      0  -9.789  < 1e-04 ***
## absdiff.Age -0.10632    0.02901      0  -3.665 0.000247 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 1984  on 1431  degrees of freedom
##  Residual Deviance: 1494  on 1429  degrees of freedom
##  
## AIC: 1498  BIC: 1509  (Smaller is better. MC Std. Err. = 0)


11. What is the probability of a tie between gang members who are the same Age?


Using the coefficient of -0.11, if two individuals are the same Age, the predicted probability of a tie between i and j is:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{age} \times \delta_{age})\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-0.97 \times 1) + (-0.11 \times 0))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-0.97 + 0)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-0.97) = 0.27\)


As the negative coefficient indicates, the predicated probability of an edge between two nodes decreases as the difference in their age increases.


12. Simulate a network from the Prison or (and?) Convictions ERGMs (i.e. #5 or/and #6).


# simulate from the prison ergm
sim.prison.LGN <- simulate(
  prison.LGN,             
  nsim=1,                 
  seed = 605               
)

# simulate from the conviction ergm
sim.convictions.LGN <- simulate(
  convictions.LGN,             
  nsim=1,                 
  seed = 605               
)


13. Plot the simulated network from #12 and the LondonGangNet, identifying nodes based on Prison or Conviction in both plots.


# use the rescale function for the plot
rescale <- function( nchar, low, high ){
  min_d <- min( nchar )
  max_d <- max( nchar )
  rscl  <- ( ( high - low )*( nchar - min_d ) ) / ( max_d - min_d ) + low
  rscl
}


# set the margins
par( mfrow=c( 2,2 ), 
     mar=c( 0.1, 0.5, 2, 0.5 ) )

# set the seed
set.seed( 605 )

# create the first plot
gplot( 
  LondonGangNet, 
  gmode = "graph",
  edge.col="grey40", 
  vertex.col="#f71505",
  vertex.sides = LondonGangNet %v% "Prison" + 3,
  vertex.cex = rescale( LondonGangNet %v% "Convictions", 0.5, 2 ),
  coord = coords,
  main = "London Gang Network"
  )

# create the second plot for the simulated network
gplot( 
  sim.prison.LGN, 
  gmode = "graph",
  edge.col = "grey40", 
  vertex.col="#11adf5",
  vertex.sides = sim.prison.LGN %v% "Prison" + 3,
  main = "Network Simulated from\n Prison ERGM"
  )

# create the third plot for the simulated network
gplot( 
  sim.convictions.LGN, 
  gmode = "graph",
  edge.col = "grey40", 
  vertex.col="#f2adf3",
  vertex.cex = rescale( sim.convictions.LGN %v% "Convictions", 0.5, 2 ),
  main = "Network Simulated from\n Convictions ERGM"
  )


14. Simulate a network from the Birthplace or (and?) Age ERGMs (i.e. #8 or/and #10).


# simulate from the Birthplace ergm
sim.homophily.birthplace.LGN <- simulate(
  homophily.birthplace.LGN,             
  nsim=1,                 
  seed = 605               
)

# simulate from the Age ergm
sim.homophily.age.LGN <- simulate(
  homophily.age.LGN,             
  nsim=1,                 
  seed = 605               
)


15. Plot the simulated network from #14 and the LondonGangNet, identifying nodes based on Birthplace or Age in both plots.


# set the margins
par( mfrow=c( 2,2 ), 
     mar=c( 0.1, 0.5, 2, 0.5 ) )

# set the seed
set.seed( 605 )

# create the first plot
gplot( 
  LondonGangNet, 
  gmode = "graph",
  edge.col="grey40", 
  vertex.col = LondonGangNet %v% "Birthplace",
  vertex.cex = rescale( LondonGangNet %v% "Age", 0.5, 2 ),
  coord = coords,
  main = "London Gang Network"
  )

# create the second plot for the simulated network
gplot( 
  sim.homophily.birthplace.LGN, 
  gmode = "graph",
  edge.col = "grey40", 
  vertex.col = sim.homophily.birthplace.LGN %v% "Birthplace",
  main = "Network Simulated from\n Birthplace ERGM"
  )

# create the third plot for the simulated network
gplot( 
  sim.homophily.age.LGN, 
  gmode = "graph",
  edge.col = "grey40", 
  vertex.cex = rescale( sim.homophily.age.LGN %v% "Age", 0.5, 2 ),
  main = "Network Simulated from\n Age ERGM"
  )


16. Evaluate the goodness of fit for the models in #5, #6, #8, or #10.


Prison ERGM GOF


# simulate the networks
prison.LGN.gof <- gof(
  prison.LGN, GOF = ~degree + espartners + distance,           
  verbose = TRUE,         
  control = control.gof.ergm( seed = 605 )
)

# set the plot pane
par( mfrow = c( 2,2 ) )

# plot the results
plot( prison.LGN.gof )

This model does a poor job reproducing the degree distribution and the edgewise shared partner distribution.

Convictions ERGM GOF


# simulate the networks
convictions.LGN.gof <- gof(
  convictions.LGN, GOF = ~degree + espartners + distance,           
  verbose = TRUE,         
  control = control.gof.ergm( seed = 605 )
)

# set the plot pane
par( mfrow = c( 2,2 ) )

# plot the results
plot( convictions.LGN.gof )

This model also does a poor job reproducing the degree distribution and the edgewise shared partner distribution.


Birthplace ERGM GOF


# simulate the networks
homophily.birthplace.LGN.gof <- gof(
  homophily.birthplace.LGN, GOF = ~degree + espartners + distance,           
  verbose = TRUE,         
  control = control.gof.ergm( seed = 605 )
)

# set the plot pane
par( mfrow = c( 2,2 ) )

# plot the results
plot( homophily.birthplace.LGN.gof )

This model also does a poor job reproducing the degree distribution and the edgewise shared partner distribution. In particular, the edgewise shared partner distribution is very poorly recreated.


Age ERGM GOF


# simulate the networks
homophily.age.LGN.gof <- gof(
  homophily.age.LGN, GOF = ~degree + espartners + distance,           
  verbose = TRUE,         
  control = control.gof.ergm( seed = 605 )
)

# set the plot pane
par( mfrow = c( 2,2 ) )

# plot the results
plot( homophily.age.LGN.gof )

Similar to what we saw above. The main reason we are seeing this poor fits is that we do not have a term to capture the edgewise shared partners. Can you think of a term the model could include that would accomplish this?


17. Which model provides the best fit to the data? How do you know?


None do a good job representing the data.




Part II: Working with a Directed Network


First, let’s read in the trustNet network. This is stored as an .rds document in the data folder on the SNA Textbook site. We will use the readRDS() function, with the file path, to load the file. Since we are calling a url, we need to use the url() function as well.

Finally, we need to make sure the sna and network packages are loaded, using library(), so that R recognizes the trustNet object as one of class network.


# clear the workspace since we may recycle objects below
rm( list = ls() )

# load the libraries we need
library( sna )
library( network )
library( ergm ) 

# define the path location for the file
loc <- "https://github.com/jacobtnyoung/sna-textbook/raw/main/data/data-WOPINS-s1-trust-net.rds"
trustNet <- readRDS( url( loc ) )

# print it out to look at it
trustNet
##  Network attributes:
##   vertices = 131 
##   directed = TRUE 
##   hyper = FALSE 
##   loops = FALSE 
##   multiple = FALSE 
##   bipartite = FALSE 
##   total edges= 515 
##     missing edges= 0 
##     non-missing edges= 515 
## 
##  Vertex attribute names: 
##     vertex.names White YearsOnUnit 
## 
## No edge attributes


1. Plot the trustNet and plot a random network with the same number of edges.


# set the seed to reproduce these results
set.seed( 605 )

# generate the random graph
random.graph <- rgraph(
  dim( as.matrix( trustNet ) )[1],
  1,                   
  tprob = sum( as.matrix( trustNet ) ) / ( dim( as.matrix( trustNet ) )[1] *( dim( as.matrix( trustNet ) )[1] - 1 ) ) ,
  mode = "digraph"
  )

# now coerce the random graph to a network object
random.net <- as.network( random.graph, directed = TRUE )
# Set the coordinates
coords <- gplot( trustNet )

Now, we can plot them both:

# set the margins
par( mfrow=c( 1,2 ), 
     mar=c( 0.1, 0.5, 2, 0.5 ) )

# create the first plot
gplot( 
  trustNet, 
  gmode = "digraph",
  edge.col="grey40", 
  vertex.col="#faa700",
  coord = coords,
  main = "Trust Network"
  )

# create the second plot
gplot( 
  random.net, 
  gmode = "digraph",
  edge.col = "grey40", 
  vertex.col="#069e6e",
  main = "Random network"
  )


2. Based on the plots, how does the trustNet differ from the random network?


Similar to what we saw above, the transitivity in the trustNet network is not reproduced in the random graph.


3. Estimate an ERGM where ties form at random. Interpret the coefficient.


# estimate the model
summary( 
  edge.indep.TN <- ergm( trustNet ~ edges ) 
)
## Call:
## ergm(formula = trustNet ~ edges)
## 
## Maximum Likelihood Results:
## 
##       Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges -3.46786    0.04474      0  -77.51   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 23609  on 17030  degrees of freedom
##  Residual Deviance:  4618  on 17029  degrees of freedom
##  
## AIC: 4620  BIC: 4628  (Smaller is better. MC Std. Err. = 0)
# calculate the probability of a tie
plogis( edge.indep.TN$coefficients[1] * 1 )
##      edges 
## 0.03024075

The probability of a tie in this model is 0.03. This is the same as the density of the network, 0.03


4. Examine and describe the relationship between degree and the attribute White.


Indegree

The mean indegree for those who are White is 3.44, while the mean indegree for those who are not White is 4.59. This means that individuals who are White (i.e. trustNet %v% "White" == 1) receive, on average, roughly 1 less edge compared to those who are not White.


Outdegree

The mean outdegree for those who are White is 4.17, while the mean outdegree for those who are not White is 3.61. This means that individuals who are White (i.e. trustNet %v% "White" == 1) send, on average, more edges (though it is small) compared to those who are not White.


5. Examine and describe the relationship between degree and the attribute YearsOnUnit.


Indegree

The mean indegree for those who are at or below the median for YearsOnUnit is 3.24, while the mean indegree for those who are above the median for YearsOnUnit is 4.63. This means that individuals who have been on the unit longer receive more trust nominations.


Outdegree

The mean outdegree for those who are at or below the median for YearsOnUnit is 3.06, while the mean outdegree for those who are above the median for YearsOnUnit is 4.82. This means that individuals who have been on the unit longer also send more trust nominations.


6. Estimate an ERGM where individuals differ in their degree based on the attribute White. Interpret the coefficient.

We could approach this a few different ways. One is to estimate a nodefactor term where we do not differentiate between incoming and outgoing ties.

# estimate the model
white.TN <- ergm( 
  trustNet ~ edges 
  + nodefactor( "White" )
  ) 

# summarize the model
summary( white.TN )
## Call:
## ergm(formula = trustNet ~ edges + nodefactor("White"))
## 
## Maximum Likelihood Results:
## 
##                    Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges              -3.38137    0.08387      0  -40.32   <1e-04 ***
## nodefactor.White.1 -0.07671    0.06391      0   -1.20     0.23    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 23609  on 17030  degrees of freedom
##  Residual Deviance:  4616  on 17028  degrees of freedom
##  
## AIC: 4620  BIC: 4636  (Smaller is better. MC Std. Err. = 0)


A second approach would be to estimate separate effects for degree differences by White using the nodeifactor and nodeofactor terms. Let’s take a look at that model.

# estimate the model
white2.TN <- ergm( 
  trustNet ~ edges 
  + nodeifactor( "White" )
  + nodeofactor( "White" )
  ) 

# summarize the model
summary( white2.TN )
## Call:
## ergm(formula = trustNet ~ edges + nodeifactor("White") + nodeofactor("White"))
## 
## Maximum Likelihood Results:
## 
##                     Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges               -3.39577    0.08461      0 -40.132  < 1e-04 ***
## nodeifactor.White.1 -0.29630    0.08953      0  -3.310 0.000934 ***
## nodeofactor.White.1  0.14807    0.09163      0   1.616 0.106115    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 23609  on 17030  degrees of freedom
##  Residual Deviance:  4604  on 17027  degrees of freedom
##  
## AIC: 4610  BIC: 4633  (Smaller is better. MC Std. Err. = 0)

First, the nodeifactor term is negative and significantly different from zero, indicating that the probability of receiving a tie decreases if an individual is White compared to an individual who is not White.

More specifically, we can calculate the predicted probability of i sending a tie to j if j is White using the coefficient of -0.3:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{White} \times \delta_{White})\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-3.39 \times 1) + (-0.3 \times 1))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.39 + -0.3)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.69) = 0.24\)


The value of 0.02 is the predicted probability of a tie.

The nodeofactor term is not significantly different from zero, indicating that there is not a substantive difference in outdegree based on the attribute White.


7. Estimate an ERGM where individuals differ in their degree based on the attribute YearsOnUnit. Interpret the coefficient.


As with White, we could model the effects of YearsOnUnit by either not considering the directionality of the ties or we could incorporate that information. Note that since YearsOnUnit is continuous, we use the nodecov terms.


# estimate the model
you.TN <- ergm( 
  trustNet ~ edges 
  + nodecov( "YearsOnUnit" )
  ) 

# summarize the model
summary( you.TN )
## Call:
## ergm(formula = trustNet ~ edges + nodecov("YearsOnUnit"))
## 
## Maximum Likelihood Results:
## 
##                      Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges               -3.898671   0.079856      0 -48.821   <1e-04 ***
## nodecov.YearsOnUnit  0.052285   0.007261      0   7.201   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 23609  on 17030  degrees of freedom
##  Residual Deviance:  4569  on 17028  degrees of freedom
##  
## AIC: 4573  BIC: 4589  (Smaller is better. MC Std. Err. = 0)

Now let’s separate them:

# estimate the model
you2.TN <- ergm( 
  trustNet ~ edges 
  + nodeicov( "YearsOnUnit" )
  + nodeocov( "YearsOnUnit" )
  ) 

# summarize the model
summary( you2.TN )
## Call:
## ergm(formula = trustNet ~ edges + nodeicov("YearsOnUnit") + nodeocov("YearsOnUnit"))
## 
## Maximum Likelihood Results:
## 
##                      Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges                -3.89869    0.07987      0 -48.815   <1e-04 ***
## nodeicov.YearsOnUnit  0.05809    0.01009      0   5.757   <1e-04 ***
## nodeocov.YearsOnUnit  0.04635    0.01032      0   4.493   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 23609  on 17030  degrees of freedom
##  Residual Deviance:  4568  on 17027  degrees of freedom
##  
## AIC: 4574  BIC: 4598  (Smaller is better. MC Std. Err. = 0)

First, the nodeicov term is positive and significantly different from zero, indicating that the probability of receiving a tie increases as the number of years the individual has spent on the unit increases.

More specifically, we can calculate the predicted probability of i sending a tie to j if j has spent the mean number of years on the unit, which is 3.72, using the coefficient of 0.06:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{YearsOnUnit} \times \delta_{YearsOnUnit})\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-3.89 \times 1) + (0.06 \times 3.72))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.89 + 0.22)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.67) = 0.25\)


The value of 0.02 is the predicted probability of a tie.


Second, the nodeocov term is also positive and significantly different from zero, indicating that the probability of sending a tie increases as the number of years the individual has spent on the unit increases.

More specifically, we can calculate the predicted probability of i sending a tie to j if i has spent the mean number of years on the unit (note the difference), which is 3.72, using the coefficient of 0.05:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + \theta_{YearsOnUnit} \times \delta_{YearsOnUnit})\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic((-3.89 \times 1) + (0.05 \times 3.72))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.89 + 0.19)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.7) = 0.24\)


The value of 0.02 is the predicted probability of a tie.


8. Examine the mixing matrix for White. What is the interpretation of this matrix?


mixingmatrix( trustNet, "White" )
##      To
## From    0   1 Sum
##   0   131  71 202
##   1   126 187 313
##   Sum 257 258 515


Based on the mixing matrix, we can see that there are more homophilous trust ties (the diagonal) compared to heterophilous ties. Interestingly, this is mainly due to non-White individuals sending fewer ties to White individuals.


9. Estimate an ERGM where are more likely to form ties based on White (i.e. homophily for the attribute White).


# estimate the model
homophily.TN <- ergm( 
  trustNet ~ edges 
  + nodematch( "White" )
  ) 

# summarize the model
summary( homophily.TN )
## Call:
## ergm(formula = trustNet ~ edges + nodematch("White"))
## 
## Maximum Likelihood Results:
## 
##                 Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges           -3.72905    0.07210      0 -51.722   <1e-04 ***
## nodematch.White  0.46565    0.09199      0   5.062   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 23609  on 17030  degrees of freedom
##  Residual Deviance:  4591  on 17028  degrees of freedom
##  
## AIC: 4595  BIC: 4611  (Smaller is better. MC Std. Err. = 0)


10. What is the probability of a tie between two individuals who are both white?


Using the coefficient of round( homophily.TN$coefficients[2], 2 ), the predicted probability of an edge between nodes who are both White or nodes who are both non-White is:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + (\theta_{homophily} \times \delta_{homophily}))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.73 \times 1 + 0.47 \times 1)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.73 + 0.47)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.26) = 0.037\)

The value of 0.037 is the predicted probability of a tie.


11. What is the probability of a tie between two individuals who are both non-white?

In the model above, it is the same. But, if we incorporate differential homophily, then we have a different coefficient.


# estimate the model
homophily2.TN <- ergm( 
  trustNet ~ edges 
  + nodematch( "White", diff=TRUE )
  ) 

# summarize the model
summary( homophily2.TN )
## Call:
## ergm(formula = trustNet ~ edges + nodematch("White", diff = TRUE))
## 
## Maximum Likelihood Results:
## 
##                   Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges              -3.7290     0.0721      0 -51.722  < 1e-04 ***
## nodematch.White.0   0.6150     0.1148      0   5.359  < 1e-04 ***
## nodematch.White.1   0.3729     0.1036      0   3.599 0.000319 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 23609  on 17030  degrees of freedom
##  Residual Deviance:  4587  on 17027  degrees of freedom
##  
## AIC: 4593  BIC: 4616  (Smaller is better. MC Std. Err. = 0)

Using the coefficient of round( homophily2.TN$coefficients[2], 2 ), the predicted probability of an edge between nodes who are both non-White is:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + (\theta_{diffhomophily} \times \delta_{diffhomophily}))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.73 \times 1 + 0.62 \times 1)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.73 + 0.62)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.11) = 0.043\)

The value of 0.043 is the predicted probability of a tie.


12. Estimate an ERGM where ties are reciprocated. Interpret the coefficient.


# estimate the model by adding the mutual term
recip.TN <- ergm( 
  trustNet ~ edges
  + mutual, 
  
  control = control.ergm(
    seed = 605 ) # here we use the control argument to set the seed to reproduce results
  ) 

summary( recip.TN )
## Call:
## ergm(formula = trustNet ~ edges + mutual, control = control.ergm(seed = 605))
## 
## Monte Carlo Maximum Likelihood Results:
## 
##        Estimate Std. Error MCMC % z value Pr(>|z|)    
## edges  -3.74246    0.05186      0  -72.17   <1e-04 ***
## mutual  2.68741    0.16686      0   16.11   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##      Null Deviance: 23609  on 17030  degrees of freedom
##  Residual Deviance:  4422  on 17028  degrees of freedom
##  
## AIC: 4426  BIC: 4441  (Smaller is better. MC Std. Err. = 1.495)


Using the coefficient of 2.69, the predicted probability of a trust tie between i and j if a trust tie exist between j and i is:

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(\theta_{edges} \times \delta_{edges} + (\theta_{reciprocity} \times \delta_{reciprocity}))\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.74 \times 1 + 2.69 \times 1)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-3.74 + 2.69)\)

\(\Bigg(P \Bigg(Y_{ij}=1 \> | \> n \> actors, Y_{ij}^C\Bigg) \Bigg) = logistic(-1.05) = 0.26\)


13. Simulate a network from the ERGM in #6, #7, #9, or (and?) #12.


# simulate from the white ergm
sim.white2.TN <- simulate(
  white2.TN,             
  nsim=1,                 
  seed = 605               
)

# simulate from the years on unit ergm
sim.you2.TN <- simulate(
  you2.TN,             
  nsim=1,                 
  seed = 605               
)

# simulate from the homophily ergm
sim.homophily2.TN <- simulate(
  homophily2.TN,             
  nsim=1,                 
  seed = 605               
)

# simulate from the reciprocity ergm
sim.recip.TN <- simulate(
  recip.TN,             
  nsim=1,                 
  seed = 605               
)


14. Plot the simulated network from #13 and the trustNet, identifying nodes based on White or (and?) YearsOnUnit in both plots.


First, we want to create a matrix of colored edges for our reciprocity ergm.

# create the symmetric matrix and colors
sympiMat <- symmetrize( trustNet, rule = "strong" )
sympiMatCols <- sympiMat
sympiMatCols[sympiMat == 0] <- "grey80"
sympiMatCols[sympiMat == 1] <- "#f71505" # set a color for mutual ties

# create the symmetric matrix for the simulation
sim.sympiMat <- symmetrize( sim.recip.TN, rule = "strong" )
sim.sympiMatCols <- sim.sympiMat
sim.sympiMatCols[sim.sympiMat == 0] <- "grey80"
sim.sympiMatCols[sim.sympiMat == 1] <- "#f78b07" # set a color for mutual ties

Now, set up our plots:

# use the rescale function for the plot
rescale <- function( nchar, low, high ){
  min_d <- min( nchar )
  max_d <- max( nchar )
  rscl  <- ( ( high - low )*( nchar - min_d ) ) / ( max_d - min_d ) + low
  rscl
}


# set the margins
par( mfrow=c( 3,2 ), 
     mar=c( 0.1, 0.5, 2, 0.5 ) )

# set the seed
set.seed( 605 )

# create the first plot
gplot( 
  trustNet, 
  gmode = "digraph",
  edge.col=sympiMatCols, 
  vertex.col="#f71505",
  vertex.sides = trustNet %v% "White" + 3,
  vertex.cex = rescale( trustNet %v% "YearsOnUnit", 0.5, 2 ),
  coord = coords,
  main = "Trust Network"
  )


# create the second plot
gplot( 
  sim.white2.TN, 
  gmode = "digraph",
  edge.col="grey40", 
  vertex.col="#0a9cf7",
  vertex.sides = sim.white2.TN %v% "White" + 3,
  #vertex.cex = rescale( sim.white2.TN %v% "YearsOnUnit", 0.5, 2 ),
  coord = coords,
  main = "Simulation from White ERGM"
  )


# create the third plot
gplot( 
  sim.you2.TN, 
  gmode = "digraph",
  edge.col="grey40", 
  vertex.col="#c3de12",
  #vertex.sides = sim.you2.TN %v% "White" + 3,
  vertex.cex = rescale( sim.you2.TN %v% "YearsOnUnit", 0.5, 2 ),
  coord = coords,
  main = "Simulation from Years\n on Unit ERGM"
  )


# create the fourth plot
gplot( 
  sim.homophily2.TN, 
  gmode = "digraph",
  edge.col="grey40", 
  vertex.col="#f707c3",
  vertex.sides = sim.homophily2.TN %v% "White" + 3,
  #vertex.cex = rescale( sim.homophily2.TN %v% "YearsOnUnit", 0.5, 2 ),
  coord = coords,
  main = "Simulation from Homophily ERGM"
  )


# create the fifth plot
gplot( 
  sim.recip.TN, 
  gmode = "digraph",
  edge.col=sim.sympiMatCols, 
  vertex.col="#f78b07",
  #vertex.sides = sim.recip.TN %v% "White" + 3,
  #vertex.cex = rescale( sim.recip.TN %v% "YearsOnUnit", 0.5, 2 ),
  coord = coords,
  main = "Simulation from Reciprocity ERGM"
  )


15. Evaluate the goodness of fit for the models in #6, #7, #9, or #12.


White GOF

# simulate the networks
white2.TN.gof <- gof(
  white2.TN, GOF = ~idegree + odegree + espartners + distance,           
  verbose = TRUE,         
  control = control.gof.ergm( seed = 605 )
)

# set the plot pane
par( mfrow = c( 3,2 ) )

# plot the results
plot( white2.TN.gof )


Years on Unit GOF

# simulate the networks
you2.TN.gof <- gof(
  you2.TN, GOF = ~idegree + odegree + espartners + distance,           
  verbose = TRUE,         
  control = control.gof.ergm( seed = 605 )
)

# set the plot pane
par( mfrow = c( 3,2 ) )

# plot the results
plot( you2.TN.gof )


Homophily GOF

# simulate the networks
homophily2.TN.gof <- gof(
  homophily2.TN, GOF = ~idegree + odegree + espartners + distance,           
  verbose = TRUE,         
  control = control.gof.ergm( seed = 605 )
)

# set the plot pane
par( mfrow = c( 3,2 ) )

# plot the results
plot( homophily2.TN.gof )


Reciprocity GOF

# simulate the networks
recip.TN.gof <- gof(
  recip.TN, GOF = ~idegree + odegree + espartners + distance,           
  verbose = TRUE,         
  control = control.gof.ergm( seed = 605 )
)

# set the plot pane
par( mfrow = c( 3,2 ) )

# plot the results
plot( recip.TN.gof )


16. Which model provides the best fit to the data? How do you know?


Look at the plots!




Back to SAND main page


Please report any needed corrections to the Issues page. Thanks!